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Acronyms 



Brookhaven National Laboratories (BNL) 
commercial-off-the-shelf (COTS) 
device under test (OUT) 

Electrical and Electronics Engineers (IEEE) 
field programmable gate array (FPGA) 
integrated circuits (ICs) 
intellectual property (IP) 

Joint Electron Devices Council (JEDEC) 
Joint Test Action Group JTAG 1149.1 (JTAG) 
power-on-reset (POR) 

Radiation Effects Data Workshop (REDW) 
single event burnout (SEB) 
single event effects (SEE) 
single event functional interrupt (SEFI) 


single event transient (SET) 

Single Event Upset Test Facility (SEUTF) 
single-event latch-up (SEL) 
test access port (TAP) 
windowed shift register (WSR) 
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Abstract 


• Electronic parts (integrated circuits) have grown 
in complexity such that determining all failure 
modes and risks from single particle event 
testing is impossible. 

• In this presentation, the authors will present 
why this is so and provide some realism on 
what this means. 


It’s all about understanding actual risks 
and not making assumptions. 
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Introduction 


• Device complexity has increased the challenges related 
to radiation single event effects (SEE) testing. 

- Obtaining appropriate test coverage and understanding of 
the response of biiiion-transistor commerciai devices, for 
exampie, are a concern for every tester. 

• This is akin to test vector coverage - have we stimulated 
sufficient nodes (or states) during our SEE test to understand 
risk properly? 

• We present three tenets for SEE testing to consider: 

- Tenet 1: All SEE test data are “good” data; 

- Tenet 2: Not all test sets/methods are appropriate or 
complete; and, 

- Tenet 3: Not all interpretation and analysis of SEE data are 
accurate. 

• Each of these tenets will be discussed in turn with two 
related technical diatribes included. 
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Diatribe 1: SEFIs - Definitions 



• JEDEC JESD89A* Definition 

- “A soft error that causes the component to reset, lock- 
up, or otherwise malfunction in a detectable way, but 
does not require power cycling of the device (off and 
back on) to restore operabiiity, uniike single-event 
latch-up (SEL), or result in permanent damage as in 
single event burnout (SEB).” 

• An example is an SEU in a controi register changing 
operationai modes of a device. 

• Modern integrated circuits (ICs) are not that 
straightforward (see next chart) 

*Joint Electron Devices Council (JEDEC) - 
Measurement and Reporting of Alpha Particle and 
Terrestrial Cosmic Ray-Induced Soft Errors in Semiconductor Devices 
(note: soft errors are terrestrial version of single event upsets (SEUs)) 
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Diatribe 1 - SEFIs? 



• Are these SEFIs? 

- An SEU in hidden circuitry 

• May not change apparent device operation, but is 
observed via changes in power consumption (power 
cycie may be required to recover), 

- A single event transient (SET) in a power-on-reset 
(POR) circuit that power cycles/resets the device 

• Problem clears itself, but there is down time and to-be- 
determined operating state after recovery, 

- An SEU that latches in a redundant (weak or flawed) 
row/column in a memory array 

• May not be recoverable by power reset, or 

- An SEU in a security block 

• Device may continue working, but user’s ability to 
change modes may be disabled. 

• We’d say YES! 
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Diatribe 1 : SEFI - The Term 



• Originally coined in the mid-1990s by Gary Swift 
(then at Jet Propulsion Laboratories) to 
describe a class of single event upsets (SEUs) 
(or a propagated SET) that causes a functional 
“hiccup” to occur and may be “soft” (can be 
cleared be reprogramming, restarting, or other 
non-power cycling means) or “hard” (requires 
power cycle). 

- Operational changes would be included as well as 
those “non-operational” changes like current creep. 

- This is a more general description than the JEDEC 
definition. 
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ASEFI Example (1) 


• The figure below illustrates a step load increase 
in the power consumption (supply current) that 
occurred during an SEU test on a field 
programmable gate array (FPGA) device (Katz, et 
al). 

- Single event latchup (SEL) is often assumed when 
power increased as observed. 


Device configuration also was altered during the event. 
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A SEFI Example (2) 

• SEU event was associated with the built in test 
circuit (Electrical and Electronics Engineers 
(IEEE) Joint Test Action Group JTAG 1149.1 
(JTAG) Test Access Port (TAP) controller as 
illustrated below. 



TCK 


TDI 
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A SEFI Example (3) 



• The bottom line is that the observational line 
between a SEFI and SEL can be very blurry. 

• Without a true understanding of the device’s 
operation (for both that which is accessible to 
the user and that which isn’t) as well as a 
maximization of visibility by the test set/method, 
understanding and classifying an event may be 
problematic. 
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Tenet 1: The Data are Always “Good” 

In short, data is just data. 


- It is what was observed and captured during an SEE 
test. 


- Now the question becomes are the data captured 
complete, appropriate, and interpreted correctly? 

• Think of the questions this brings into play: 

- Have all data points been captured? (adequate and 
reliable data capture), 

- Was the test prognostic enough to gather the right 
range of data (think of the simple SET capture from an 
operational amplifier- was the minimum pulse 
width/amplitude sensitivity of your osciiloscope set 
appropriateiy)? (appropriate test set granularity); or, 

- Have all the right test vehicies been used to generate 
that data? (adequate test circuits/operation) 

The point is simple: the data are correct, but there’s either not enough of it or 

insufficient granularity of information. 

The simple takeaway is that testing requires a look far below the surface... 
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Tenet 2: The Test (1) 



• The first complication comes from the way the 
device under test (OUT) is tested and the way 
data capture was performed. 

• The general idea is to focus on prognostic testing 
- ensuring that your test design is inquisitive 
enough to capture all available information on an 
event. 

• We will define design visibility as ensuring that 
the interface between what the OUT is doing and 
how the test system is operating is adequate to 
capture all relevant event information. 
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Tenet 2: The Test (2) 

• While this presentation isn’t a “how-to-test” 
document, it does recommend a thought process 
on what needs to be thoroughly considered in 
advance of test. 

• An example would be having a high-speed logic 
string, such as a shift register, with inadequate 
output buffer performance that limits operation to 
10% of the frequency capability. 

- In a case like this, choice of output buffer type along 
with a concept such as a windowed shift register (WSR) 
approach [Berg, et al] would allow for a proper operation 
and data capture. 

• Bottom line: know how the testing was done and 
the level of completeness and granularity of data 
captured. 
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Tenet 3: The Analysis 



• The real output of any SEE test campaign is not 
only the ability to determine rates for space 
usage, but also the error signatures of the events. 

• This is the key to understanding the risk beyond 
the SEE rates and to provide the system 
designers the information to properly design 
mitigation or fault tolerance approaches. 

• The simple way of viewing this is that all SEU 
events that cause SEFIs are not created equal: 

- They have different circuit responses and capturing and 
diagnosing them can be a challenge. 

- One SEFI may change the operating mode, while 
another may cause a current increase. 
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Diatribe 2: Limiting cross-sections (1) 



• The theory is pretty straightforward: 

- Just because an event is not observed during a SEE test 
run doesn’t ruie out the potential that the next particie 
will cause the event (or a different event). 

• SEE is known to be a Markov process in that past 
performance is not necessariiy an indicator what 
happens next. 

- One then assumes that the next particle will cause an 
event. 

- The limiting cross-section is then usually designated as 
1/(fluence of the test run, i.e., the total number of 
particles/cm^ accumulated during that run). 
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Diatribe 2: Limiting cross-sections (2) 

• A simple example was documented in 1998 by 
LaBel, et al. 

- Proton SEE tests were performed with a sample size of 
3 and a proton test fluence of 1xl0^“ p/cm^ 

- A specific SEFI condition was not observed (row/column 
errors). 

- The project utilizing this device did not understand that 
this implied a limiting cross-section, as opposed to a 
zero cross-section or immunity to the effect. 

- They flew 1000 samples of this device and observed this 
SEFI in flight. 

- A re-test with 100 parts and a higher proton fluence 
confirmed this rare event and device sensitivity. 

In cases of billion transistor devices, the probability of stimulating all 
possible error signatures is statistically zero for a typical test 
campaign. Thus, the best we can try to do is provide the limits for 

other error types not observed 

To be presented by Kenneth A. LaBel at the 201 5 Government Microcircuits Applications & Critical Technologies Conference (GOMAC) Tech, 

St. Louis, MO, March 23-26, 201 5. ^ ' 


Caveat Emptor! 

• The figure below is from the 1992 IEEE Radiation Effects Data 
Workshop (REDW) record (LaBel, et al.) 

• To summarize what was presented, 

- A system level test of an INTEL 80386 processor and several 
peripherals was performed at Brookhaven National Laboratories 
(BNL) Single Event Upset Test Facility (SEUTF). 

- The data were for a representative test run and interpreted as 
“microlatchup” - a series of SEL events that caused a step-like 
increase with each event in the power supply current 
consumption. 

- However, the device continued to function during the test run 

even with the increases. 
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Mea Culpa! 



• Realistically, more diagnostics were needed to 
determine if these really were SEL events and 
not possibly caused by SEU hits to hidden logic 
or bus contention or another SEFI event. 

• Even over twenty years ago, device complexity 
and understanding should have been better 
explored. 


wrong 

answer 

try again... 


To be presented by Kenneth A. LaBel at the 201 5 Government Microcircuits Applications & Critical Technologies Conference (GOMAC) Tech 
St. Louis, MO, March 23-26, 2015. 


19 



Discussion 

• The realistic implications are different depending on 
whether the device is commercial-off-the-shelf (COTS) or 
custom-designed for radiation tolerance.. 

- For COTS, you will be dealing with unknowns and limitations, 
hence capturing as many error signatures as possible 
provides the most useful information. 

* It’s what the designers need to build appropriate mitigation into 
their systems. 

- For custom design, you should be able to predict error 
signatures as long as there aren’t intellectual property (IP) 
blocks of unknown design (black box designs). 

* Thus, tests here are usually more about statistics to meet SEU 
rates or threshold levels. 

* That is, unless unexpected SEFI events occur. 

• Devices like FPGAs are afflicted with both implications: 

- Custom designs are created, but there’s also manufacturer-embedded 
IP and hidden functions that require detailed error signature capture. 

- Double the challengel 
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Summary 



• While far from a complete treatise on testing, we 
have provided some caveats in reviewing a 
device’s SEE performance based on collected 
data. 

• The level of understanding of the device’s 

internal workings as well as the limitations of the 

test setup, allow proper risk-based analyses to be 

performed on the collected SEE data. 

- It’s not just event rates, but event signatures and 
interpretation! 
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